The effect of unlabeled data on generative classifiers, with application to model selection

Authors

  • Ira Cohen
  • Fabio G. Cozman
  • Alexandre Bronstein
Abstract

In this paper we investigate the effect of unlabeled data on generative classifiers in semi-supervised learning. We first characterize situations where unlabeled data cannot change estimates obtained with labeled data, and argue that such situations are unusual in practice. We then report on a large set of experiments involving labeled and unlabeled data, and demonstrate that unlabeled data can degrade classification performance when modeling assumptions are incorrect. To improve classification performance, we propose a method to switch assumed model structure based on the effect of unlabeled data.
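
As a rough illustration of how unlabeled data enters the estimates of a generative classifier, the sketch below runs EM on a two-class, one-dimensional Gaussian model. It is not the authors' method or experimental setup. The unit variance, the function names `gaussian_pdf` and `semi_supervised_em`, and the toy data are all assumptions made for this example.

```python
import math


def gaussian_pdf(x, mean, var=1.0):
    """Density of a 1-D Gaussian; the unit variance is an assumption of this sketch."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def semi_supervised_em(labeled, unlabeled, n_iter=50):
    """Estimate class priors and means from labeled and unlabeled data.

    labeled:   list of (x, y) pairs with y in {0, 1}; both classes must appear.
    unlabeled: list of x values with unknown class.
    Returns (priors, means), each a list indexed by class.
    """
    # Sufficient statistics of the labeled data: counts and sums per class.
    counts = [0, 0]
    sums = [0.0, 0.0]
    for x, y in labeled:
        counts[y] += 1
        sums[y] += x

    # Initialize with the labeled-only maximum-likelihood estimates.
    priors = [counts[c] / len(labeled) for c in (0, 1)]
    means = [sums[c] / counts[c] for c in (0, 1)]
    total_n = len(labeled) + len(unlabeled)

    for _ in range(n_iter):
        # E-step: posterior class probabilities for each unlabeled point.
        resp = []
        for x in unlabeled:
            w = [priors[c] * gaussian_pdf(x, means[c]) for c in (0, 1)]
            z = w[0] + w[1]
            resp.append([w[0] / z, w[1] / z])

        # M-step: labeled points count fully, unlabeled points fractionally.
        for c in (0, 1):
            weight = counts[c] + sum(r[c] for r in resp)
            weighted_sum = sums[c] + sum(r[c] * x for r, x in zip(resp, unlabeled))
            priors[c] = weight / total_n
            means[c] = weighted_sum / weight

    return priors, means


# Toy data (hypothetical): two labeled points per class, three unlabeled points.
labeled_data = [(0.0, 0), (1.0, 0), (4.0, 1), (6.0, 1)]
unlabeled_data = [2.0, 2.5, 3.0]

priors_l, means_l = semi_supervised_em(labeled_data, [])
priors_lu, means_lu = semi_supervised_em(labeled_data, unlabeled_data)
```

With no unlabeled data the function returns the labeled-only estimates unchanged. With the unlabeled points added, both class means are pulled toward the unlabeled mass. Whether that shift helps or hurts classification depends on whether the assumed model matches the data, which is the question the abstract raises.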

Similar resources

Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets

Objective(s): This study addresses feature selection for breast cancer diagnosis. The process uses a wrapper approach with GA-based feature selection and a PS-classifier. The experimental results show that the proposed model is comparable to other models on the Wisconsin breast cancer datasets. Materials and Methods: To evaluate the effectiveness of the proposed feature selection method, we ...

Unlabeled Data Can Degrade Classification Performance of Generative Classifiers

This report analyzes the effect of unlabeled training data in generative classifiers. We are interested in classification performance when unlabeled data are added to an existing pool of labeled data. We show that there are situations where unlabeled data can degrade the performance of a classifier. We present an analysis of these situations and explain several seemingly disparate results in the l...

A New Model Selection Test with Application to the Censored Data of Carbon Nanotubes Coating

Model selection for nano and micro droplet spreading can be widely used to predict and optimize different coating processes such as ink jet printing, spray painting and plasma spraying. The idea of model selection is to begin with a set of data and rival models and choose the best one. Deciding among the models in this set is an important question in statistical inference. Some tests and criteria a...

Application of ensemble learning techniques to model the atmospheric concentration of SO2

In view of pollution prediction modeling, the study adopts homogenous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of Sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...

Evaluation and selection of sustainable suppliers in supply chain using new GP-DEA model with imprecise data

Nowadays, with respect to knowledge growth about enterprise sustainability, sustainable supplier selection is considered a vital factor in sustainable supply chain management. On the other hand, usually in real problems, the data are imprecise. One method that is helpful for the evaluation and selection of the sustainable supplier and has the ability to use a variety of data types is data envel...

Publication date: 2002